Evaluation of the Highest Probability SVM Nearest Neighbor Classifier with Variable Relative Error Cost

نویسندگان

  • Enrico Blanzieri
  • Anton Bryl
چکیده

In this paper we evaluate the performance of the highest probability SVM nearest neighbor (HP-SVM-NN) classifier, which combines the ideas of the SVM and k-NN classifiers, on the task of spam filtering. To classify a sample, the HP-SVM-NN classifier does the following: for each k in a predefined set {k1, ..., kN} it trains an SVM model on k nearest labeled samples, uses this model to classify the given sample, and transforms the output of SVM into posterior probabilities of the classes using sigmoid approximation; than it selects that of the 2×N resulting answers which has the highest probability. The evaluation shows that in terms of ROC curves the algorithm is able outperform pure SVM.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Highest Probability Svm Nearest Neighbor Classifier for Spam Filtering

In this paper we evaluate the performance of the highest probability SVM nearest neighbor classifier, which is a combination of the SVM and k-NN classifiers, on a corpus of email messages. To classify a sample the algorithm performs the following actions: for each k in a predefined set {k1, ..., kN} it trains an SVM model on k nearest labelled samples, and uses this model to classify the given ...

متن کامل

کاربرد الگوریتم‌های داده‌کاوی در تفکیک منابع رسوبی حوزۀ آبخیز نوده گناباد

Introduction: Reduction of sediment supply requires the implementation of soil conservation and sediment control programs in the form of watershed management plans. Sediment control programs require identifying the relative importance of sediment sources, their quantitative ascription and identification of critical areas within the watersheds. The sediment source ascription is involves two...

متن کامل

Non-zero probability of nearest neighbor searching

Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is efficiently reporting the nearest data to a given object as a query. In most of the studies both the data and query are assumed to be precise, however, due to the real applications of NN searching, suc...

متن کامل

Diabetes Prediction by Optimizing the Nearest Neighbor Algorithm Using Genetic Algorithm

Introduction: Diabetes or diabetes mellitus is a metabolic disorder in body when the body does not produce insulin, and produced insulin cannot function normally. The presence of various signs and symptoms of this disease makes it difficult for doctors to diagnose. Data mining allows analysis of patients’ clinical data for medical decision making. The aim of this study was to provide a model fo...

متن کامل

Instance-Based Spam Filtering Using SVM Nearest Neighbor Classifier

In this paper we evaluate an instance-based spam filter based on the SVM nearest neighbor (SVM-NN) classifier, which combines the ideas of SVM and k-nearest neighbor. To label a message the classifier first finds k nearest labeled messages, and then an SVM model is trained on these k samples and used to label the unknown sample. Here we present preliminary results of the comparison of SVM-NN wi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007